Introduction

We have built a simulation environment for haplodiploid simulations. As a proof-of-concept, we compare simple simulations of haplodiploid populations with simulations of didplodiploid populations.

In each simulation, we create a population with 1000 males and 1000 females. Each individual has a genome with 1,000,000 loci. At each generation, each individual produces gametes (with a mutation rate of \(1\times10^{-8}\) and recombination of \(\times10^{-5}\)) and mates with another individual of the opposite sex. The alleles produced by mutation have a given selective coefficient, which we change depending on the simulation condition. The inheritance of the genome onto the next generation is based on the sum of the selective coefficients of the alleles carried by each individual. In the haplodiploid simulation, males are haploid, produced by the non-fertilised gametes of diploid females. In diploid simulations, both males and females are diploid, and there is no sex chromosome.

We ran the simulations where we varied the selection coefficient of new mutations. We ran the following simulations:

Each simulation was ran for 40,000 generations after a run-in of 10,000 generations. For each simulation, we recorded the number of novel alleles that have become fixed (cumulatively) every 100th generation.

Simulations with neutral mutations

Haplodiploid populations have a lower effective population size than diploid populations. This means that there will be fewer neutral mutations emerging in a haplodiploid population than in a diploid population with the same number of individuals, but that each mutation has a higher chance of becoming fixed. These two processes are expected to balance out, so that there is no difference in substitution rate between populations.

An essential point is that the fixation rate of neutral mutations is equal to the mutation rate. For diploids, the fixation probability of a new mutation is \(1/(2N)\) (as it is present in that proportion when it appears) and the number of new mutations per generation is \(2N\mu\), so that the fixation rate (fixation probability \(\times\) mutation rate per generation) is:

\(1/(2N) \times 2N\mu = \mu\)

For haplodiploids, the same is true:

\(1/(1.5N) \times 1.5N\mu = \mu\)

In our simulations, we have a mutation rate of \(1\times10^{-7}\) and a genome with \(100,000\) loci, giving us an expectation of 0.01 mutations per generation. If we run our simulations for a total of 5^{4} generations, we would expect a total of 500 fixed mutations.

Let us start by using a burnin period of 10,000 generations.

We measure the fixation rate by counting how many mutations have become fixed at the end of the simulation.

As expected, the average number of fixed alleles per generation is approximately 0.01 for both ploidy types.

##   selection haplodiploid diploid haplodiploid_rate diploid_rate rate_difference
## 1    s = 0%      397.035  389.82       0.009925875    0.0097455     0.000180375
##   difference_per
## 1             2%

However, the diploid simulations seem to have slightly fewer fixed mutations than the diploid simulations.

##   dominance selection statistic     p.value
## 1    h = 0%    s = 0%     16405 0.001873824
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided

Choose correct burn-in period

In simulations, it can take many generations for drift and mutation to balance. This is likely to explain the slightly smaller number of fixed mutations in the diploid population. This is because difference between haplodiploids and diploids is already quite large after 10000 generations, and it does not seem to increase much more in after another 40,000 generations.

We thus tested whether using larger burn-ins decreased the difference between haplodiploid and diploid simulations:

##   burnin selection haplodiploid diploid haplodiploid_rate diploid_rate
## 1  10000    s = 0%      397.035 389.820       0.009925875  0.009745500
## 2  11000    s = 0%      387.705 382.235       0.009941154  0.009800897
## 3  12500    s = 0%      373.815 369.070       0.009968400  0.009841867
## 4  15000    s = 0%      350.080 346.240       0.010002286  0.009892571
## 5  20000    s = 0%      300.305 297.515       0.010010167  0.009917167
##   rate_difference difference_per statistic     p.value
## 1    0.0001803750             2%   16405.0 0.001873824
## 2    0.0001402564             1%   17309.5 0.019962628
## 3    0.0001265333             1%   17731.0 0.049713044
## 4    0.0001097143             1%   18245.5 0.129167100
## 5    0.0000930000             1%   18680.5 0.253833745
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided
## 2 Wilcoxon rank sum test with continuity correction   two.sided
## 3 Wilcoxon rank sum test with continuity correction   two.sided
## 4 Wilcoxon rank sum test with continuity correction   two.sided
## 5 Wilcoxon rank sum test with continuity correction   two.sided

From these results, we conclude that using a longer burn-in period decreases the difference between haplodiploids and diploids at the last generation. We therefore take a burn-in of 15,000 hereafter.

Neutral simulations

For the neutral simulation, we expect the haplodiploid and diploid simulations to have the same fixation rate. Below, each line is a single simulation. The rate of fixation is the slope of those lines, or, in other words, the average number of fixed alleles per generation.

To measure the fixation rate, we can look at the average number of fixed alleles at the end of the simulations.

As expected, the average number of fixed alleles per generation is approximately 0.01 for both ploidy types.

##   selection haplodiploid diploid haplodiploid_rate diploid_rate rate_difference
## 1    s = 0%       350.08  346.24        0.01000229  0.009892571    0.0001097143
##   difference_per
## 1             1%

We then test if the values are normally distributed:

## 
##  Shapiro-Wilk normality test
## 
## data:  final_neutral_for_wilcox_test$diploid
## W = 0.99135, p-value = 0.2787
## 
##  Shapiro-Wilk normality test
## 
## data:  final_neutral_for_wilcox_test$haplodiploid
## W = 0.99299, p-value = 0.4592

As expected, there is no difference between fixation rate of neutral mutations in simulations of haplodiploid and diploid populations.

##   selection haplodiploid diploid haplodiploid_rate diploid_rate rate_difference
## 1    s = 0%       350.08  346.24        0.01000229  0.009892571    0.0001097143
##   difference_per statistic   p.value
## 1             1%   18245.5 0.1291671
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided

Simulations with advantageous mutations

For the simulation where all mutations were advantageous, we expect the haplodiploid populations to fix alleles quicker, where the males (which are haploid) are under selection as soon as the mutation appears, whereas the mutation needs to be at high enough frequency to be in homozygous individuals before it is under selection. This is obvious in the simulations with the most highly advantageous mutations (S = +10%).

Again, to measure the fixation rate, we can look at the average number of fixed alleles at the end of the simulations.

The fixation rate is higher than in the neutral simulations for all classes. As expected, haplodiploids have higher fixation rates than diploids.

##   selection haplodiploid  diploid haplodiploid_rate diploid_rate
## 1  s = 0.1%      1024.31  774.155        0.02926600   0.02211871
## 2  s = 0.3%      2131.66 1270.655        0.06090457   0.03630443
## 3    s = 1%      4353.90 2089.120        0.12439714   0.05968914
##   rate_difference difference_per
## 1     0.007147286            32%
## 2     0.024600143            68%
## 3     0.064708000           108%

The fixation rates are not normally distributed:

## # A tibble: 6 × 5
## # Groups:   ploidy, selection [6]
##   ploidy       selection statistic p.value method                     
##   <fct>        <fct>         <dbl>   <dbl> <chr>                      
## 1 diploid      s = 0.1%      0.991  0.227  Shapiro-Wilk normality test
## 2 diploid      s = 0.3%      0.995  0.802  Shapiro-Wilk normality test
## 3 diploid      s = 1%        0.992  0.390  Shapiro-Wilk normality test
## 4 haplodiploid s = 0.1%      0.982  0.0131 Shapiro-Wilk normality test
## 5 haplodiploid s = 0.3%      0.990  0.162  Shapiro-Wilk normality test
## 6 haplodiploid s = 1%        0.992  0.387  Shapiro-Wilk normality test

All comparisons are statistically significant:

##   selection statistic      p.value
## 1  s = 0.1%         0 4.787268e-67
## 2  s = 0.3%         0 4.798804e-67
## 3    s = 1%         0 4.801980e-67
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided
## 2 Wilcoxon rank sum test with continuity correction   two.sided
## 3 Wilcoxon rank sum test with continuity correction   two.sided

Simulations with deleterious mutations

For the deleterious mutations, we expect that the fixation rate will be lower than that of neutral or advantageous mutations. Interestingly, none of of the highly deleterious mutations (S=-10%) was fixed in the population.

We can look at the fixation rate of the other two mutation types.

The difference between haplodiploid and diploid observed for midly deleterious mutations is very large.

##    selection haplodiploid diploid haplodiploid_rate diploid_rate
## 1 s = -0.03%      197.100 217.630      5.631429e-03 6.218000e-03
## 2 s = -0.05%      127.895 147.375      3.654143e-03 4.210714e-03
## 3  s = -0.1%       37.205  45.590      1.063000e-03 1.302571e-03
## 4  s = -0.3%        0.040   0.060      1.142857e-06 1.714286e-06
## 5    s = -1%        0.000   0.000      0.000000e+00 0.000000e+00
##   rate_difference difference_per
## 1   -5.865714e-04            -9%
## 2   -5.565714e-04           -13%
## 3   -2.395714e-04           -18%
## 4   -5.714286e-07           -33%
## 5    0.000000e+00           NaN%

The number of fixed mutations is normally distributed for some of the simulations, but this is not a sufficient justification to use a t-test instead of a Wilcoxon rank sum for these simulations only:

## # A tibble: 8 × 5
## # Groups:   ploidy, selection [8]
##   ploidy       selection  statistic p.value method                     
##   <fct>        <fct>          <dbl>   <dbl> <chr>                      
## 1 diploid      s = -0.03%     0.983  0.0189 Shapiro-Wilk normality test
## 2 diploid      s = -0.05%     0.987  0.0584 Shapiro-Wilk normality test
## 3 diploid      s = -0.1%      0.996  0.907  Shapiro-Wilk normality test
## 4 diploid      s = -0.3%      0.252  0      Shapiro-Wilk normality test
## 5 haplodiploid s = -0.03%     0.996  0.835  Shapiro-Wilk normality test
## 6 haplodiploid s = -0.05%     0.986  0.0431 Shapiro-Wilk normality test
## 7 haplodiploid s = -0.1%      0.989  0.129  Shapiro-Wilk normality test
## 8 haplodiploid s = -0.3%      0.176  0      Shapiro-Wilk normality test

However, the difference observed for the weakly deleterious mutations is not significant.

##    selection haplodiploid diploid haplodiploid_rate diploid_rate
## 1 s = -0.03%      197.100 217.630      5.631429e-03 6.218000e-03
## 2 s = -0.05%      127.895 147.375      3.654143e-03 4.210714e-03
## 3  s = -0.1%       37.205  45.590      1.063000e-03 1.302571e-03
## 4  s = -0.3%        0.040   0.060      1.142857e-06 1.714286e-06
## 5    s = -1%        0.000   0.000      0.000000e+00 0.000000e+00
##   rate_difference difference_per statistic      p.value
## 1   -5.865714e-04            -9%     33275 1.584650e-30
## 2   -5.565714e-04           -13%     35436 1.119552e-40
## 3   -2.395714e-04           -18%     32678 5.066054e-28
## 4   -5.714286e-07           -33%     20494 2.466463e-01
## 5    0.000000e+00           NaN%     20000          NaN
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided
## 2 Wilcoxon rank sum test with continuity correction   two.sided
## 3 Wilcoxon rank sum test with continuity correction   two.sided
## 4 Wilcoxon rank sum test with continuity correction   two.sided
## 5 Wilcoxon rank sum test with continuity correction   two.sided

The simulation with more deleterious mutations have very few mutations becoming fixed.

## # A tibble: 10 × 3
## # Groups:   selection [5]
##    selection  ploidy       simulations_with_fixed_mutations
##    <fct>      <fct>                                   <int>
##  1 s = -0.03% haplodiploid                              200
##  2 s = -0.03% diploid                                   200
##  3 s = -0.05% haplodiploid                              200
##  4 s = -0.05% diploid                                   200
##  5 s = -0.1%  haplodiploid                              200
##  6 s = -0.1%  diploid                                   200
##  7 s = -0.3%  haplodiploid                                7
##  8 s = -0.3%  diploid                                    12
##  9 s = -1%    haplodiploid                                0
## 10 s = -1%    diploid                                     0

For the simulation where s = -0.3%, these simulation runs had one or more mutations becoming fixed.

## # A tibble: 19 × 4
##    selection ploidy       simulation_id number_fixed
##    <fct>     <fct>                <dbl>        <dbl>
##  1 s = -0.3% diploid                  1            1
##  2 s = -0.3% diploid                100            1
##  3 s = -0.3% diploid                112            1
##  4 s = -0.3% diploid                122            1
##  5 s = -0.3% diploid                171            1
##  6 s = -0.3% diploid                172            1
##  7 s = -0.3% diploid                194            1
##  8 s = -0.3% diploid                196            1
##  9 s = -0.3% diploid                 25            1
## 10 s = -0.3% diploid                 34            1
## 11 s = -0.3% diploid                 59            1
## 12 s = -0.3% diploid                 66            1
## 13 s = -0.3% haplodiploid            10            1
## 14 s = -0.3% haplodiploid           114            1
## 15 s = -0.3% haplodiploid           117            1
## 16 s = -0.3% haplodiploid           120            1
## 17 s = -0.3% haplodiploid           181            1
## 18 s = -0.3% haplodiploid           190            1
## 19 s = -0.3% haplodiploid            80            2

Simulation with varying levels of dominance coefficient

For this, we ran simulation with the following parameters: * Larger genome of 1e6 loci * Smaller mutation rate of 1e-8 * Larger population of 2000

##   selection dominance haplodiploid_rate diploid_rate difference_per
## 1  s = 0.1%    h = 0%        0.02924986   0.02212743            32%
## 2  s = 0.1%   h = 25%        0.03339200   0.02967186            13%
## 3  s = 0.1%   h = 50%        0.03802357   0.03874143            -2%
## 4  s = 0.1%   h = 75%        0.04283343   0.04920943           -13%
## 5  s = 0.1%  h = 100%        0.04813114   0.06043971           -20%
##   dominance selection statistic      p.value
## 1    h = 0%  s = 0.1%       0.0 4.789695e-67
## 2  h = 100%  s = 0.1%   40000.0 4.807457e-67
## 3   h = 25%  s = 0.1%      77.5 1.531093e-66
## 4   h = 50%  s = 0.1%   27872.0 9.840289e-12
## 5   h = 75%  s = 0.1%   40000.0 4.798331e-67
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided
## 2 Wilcoxon rank sum test with continuity correction   two.sided
## 3 Wilcoxon rank sum test with continuity correction   two.sided
## 4 Wilcoxon rank sum test with continuity correction   two.sided
## 5 Wilcoxon rank sum test with continuity correction   two.sided
## # A tibble: 10 × 6
## # Groups:   ploidy, dominance, selection [10]
##    ploidy       dominance selection statistic p.value method                    
##    <fct>        <fct>     <fct>         <dbl>   <dbl> <chr>                     
##  1 diploid      h = 0%    s = 0.1%      0.995  0.763  Shapiro-Wilk normality te…
##  2 diploid      h = 100%  s = 0.1%      0.995  0.727  Shapiro-Wilk normality te…
##  3 diploid      h = 25%   s = 0.1%      0.991  0.225  Shapiro-Wilk normality te…
##  4 diploid      h = 50%   s = 0.1%      0.989  0.137  Shapiro-Wilk normality te…
##  5 diploid      h = 75%   s = 0.1%      0.992  0.368  Shapiro-Wilk normality te…
##  6 haplodiploid h = 0%    s = 0.1%      0.992  0.357  Shapiro-Wilk normality te…
##  7 haplodiploid h = 100%  s = 0.1%      0.995  0.805  Shapiro-Wilk normality te…
##  8 haplodiploid h = 25%   s = 0.1%      0.992  0.310  Shapiro-Wilk normality te…
##  9 haplodiploid h = 50%   s = 0.1%      0.988  0.0896 Shapiro-Wilk normality te…
## 10 haplodiploid h = 75%   s = 0.1%      0.994  0.597  Shapiro-Wilk normality te…

Varying levels of dominance coefficient with the same Ne

Haplodiploid populations have 1.5N chromosomes in the population, whereas diploid populations have 2N chromosomes. We re-ran the haplodiploid simulations with a larger population size (K = 2667).

##   dominance haplodiploid  diploid haplodiploid_rate diploid_rate
## 1    h = 0%     1238.060  774.460        0.03537314   0.02212743
## 2  h = 100%     2188.975 2115.390        0.06254214   0.06043971
## 3   h = 25%     1449.430 1038.515        0.04141229   0.02967186
## 4   h = 50%     1680.310 1355.950        0.04800886   0.03874143
## 5   h = 75%     1925.360 1722.330        0.05501029   0.04920943
##   rate_difference difference_per selection statistic      p.value
## 1     0.013245714            60%  s = 0.1%       0.0 4.791717e-67
## 2     0.002102429             3%  s = 0.1%    4869.5 3.908280e-39
## 3     0.011740429            40%  s = 0.1%       0.0 4.788953e-67
## 4     0.009267429            24%  s = 0.1%       0.0 4.797994e-67
## 5     0.005800857            12%  s = 0.1%       3.0 5.017313e-67
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided
## 2 Wilcoxon rank sum test with continuity correction   two.sided
## 3 Wilcoxon rank sum test with continuity correction   two.sided
## 4 Wilcoxon rank sum test with continuity correction   two.sided
## 5 Wilcoxon rank sum test with continuity correction   two.sided

Make figures for manuscript

We remake the deleterious figure, as we do not want to include simulations with strongly deleterious mutations, in which almost no simulation had any fixed mutation.

## quartz_off_screen 
##                 2
## quartz_off_screen 
##                 2
## quartz_off_screen 
##                 2

Make table for Supplementary

Wright-Fisher versus non-Wright-Fisher

We want to show that the non-Wright-Fisher simulations of diploid populations are equivalent Wright-Fisher simulations. We re-ran the simulations of adaptive alleles using a simple Wright-Fisher model.

##   selection statistic   p.value
## 1  s = 0.1%   21727.5 0.1352004
## 2  s = 0.3%   19291.5 0.5402583
## 3    s = 1%   20149.0 0.8977918
## 4    s = 0%   20073.0 0.9499901
##                                              method alternative
## 1 Wilcoxon rank sum test with continuity correction   two.sided
## 2 Wilcoxon rank sum test with continuity correction   two.sided
## 3 Wilcoxon rank sum test with continuity correction   two.sided
## 4 Wilcoxon rank sum test with continuity correction   two.sided

Test with ci

## quartz_off_screen 
##                 2